arXiv.org e-Print Archive

Exponential Renormalization II: Bogoliubov's R-operation and momentum subtraction schemes

Author: Collins J.
Frédéric Patras
Itzykson C.
Kurusch Ebrahimi-Fard
Mackey G. W.
Smirnov V. A.
Zavialov O. I.
Zimmermann W.
Publication venue: 'AIP Publishing'
Publication date: 01/01/2012

This article aims at advancing the recently introduced exponential method for renormalisation in perturbative quantum field theory. It is shown that this new procedure provides a meaningful recursive scheme in the context of the algebraic and group theoretical approach to renormalisation. In particular, we describe in detail a Hopf algebraic formulation of Bogoliubov's classical R-operation and counterterm recursion in the context of momentum subtraction schemes. This approach allows us to propose an algebraic classification of different subtraction schemes. Our results shed light on the peculiar algebraic role played by the degrees of Taylor jet expansions, especially the notion of minimal subtraction and oversubtractions. Comment: revised versio

Crossref

HAL Descartes

Video Summarization Using Deep Neural Networks: A Survey

Author: Adamantidou E
Apostolidis E
Metsai AI
Mezaris V
Patras I
Publication date: 01/01/2021

Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content. Several approaches have been developed over the last couple of decades and the current state of the art is represented by methods that rely on modern deep neural network architectures. This work focuses on the recent advances in the area and provides a comprehensive survey of the existing deep-learning-based methods for generic video summarization. After presenting the motivation behind the development of technologies for video summarization, we formulate the video summarization task and discuss the main characteristics of a typical deep-learning-based analysis pipeline. Then, we suggest a taxonomy of the existing algorithms and provide a systematic review of the relevant literature that shows the evolution of the deep-learning-based video summarization technologies and leads to suggestions for future developments. We then report on protocols for the objective evaluation of video summarization algorithms and we compare the performance of several deep-learning-based approaches. Based on the outcomes of these comparisons, as well as some documented considerations about the suitability of evaluation protocols, we indicate potential future research directions. Comment: Journal paper; Under revie

arXiv.org e-Print Archive

A deep generic to specific recognition model for group membership analysis using non-verbal cues

Author: Gunes H
Mezaris V
Mou W
Patras I
Tzelepis C
Publication venue: Image and Vision Computing
Publication date: 03/10/2018

Automatic understanding and analysis of groups has attracted increasing attention in the vision and multimedia communities in recent years. However, little attention has been paid to the automatic analysis of non-verbal behaviors and how this can be utilized for analysis of group membership, i.e., recognizing which group each individual is part of. This paper presents a novel Support Vector Machine (SVM) based Deep Specific Recognition Model (DeepSRM) that is learned based on a generic recognition model. The generic recognition model refers to the model trained with data across different conditions, i.e., when people are watching movies of different types. Although the generic recognition model can provide a baseline for the recognition model trained for each specific condition, the different behaviors people exhibit in different conditions limit the recognition performance of the generic model. Therefore, the specific recognition model is proposed for each condition separately and built on top of the generic recognition model. We conduct a set of experiments using a database collected to study group analysis while each group (i.e., four participants together) was watching a number of long movie segments. The proposed deep specific recognition model (44%) outperforms the generic recognition model (26%). The recognition of group membership also indicates that the non-verbal behaviors of individuals within a group share commonalities

City Research Online

Apollo (Cambridge)

AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization

Author: Adamantidou E
Apostolidis E
Metsai A
Mezaris V
Patras I
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

This paper presents a new method for unsupervised video summarization. The proposed architecture embeds an Actor-Critic model into a Generative Adversarial Network and formulates the selection of important video fragments (that will be used to form the summary) as a sequence generation task. The Actor and the Critic take part in a game that incrementally leads to the selection of the video key-fragments, and their choices at each step of the game result in a set of rewards from the Discriminator. The designed training workflow allows the Actor and Critic to discover a space of actions and automatically learn a policy for key-fragment selection. Moreover, the introduced criterion for choosing the best model after the training ends enables the automatic selection of proper values for parameters of the training process that are not learned from the data (such as the regularization factor σ). Experimental evaluation on two benchmark datasets (SumMe and TVSum) demonstrates that the proposed AC-SUM-GAN model performs consistently well and gives state-of-the-art results in comparison to unsupervised methods; these results are also competitive with respect to supervised methods

VideoAnalysis4ALL: An On-line Tool for the Automatic Fragmentation and Concept-based Annotation, and the Interactive Exploration of Videos.

Author: Apostolidis EE
Collyda C
Markatopoulou F
Mezaris V
Patras I
Pournaras A
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2017

This paper presents the VideoAnalysis4ALL tool that supports the automatic fragmentation and concept-based annotation of videos, and the exploration of the annotated video fragments through an interactive user interface. The developed web application decomposes the video into two different granularities, namely shots and scenes, and annotates each fragment by evaluating the existence of a number (several hundred) of high-level visual concepts in the keyframes extracted from these fragments. Through this analysis, the tool enables the identification and labeling of semantically coherent video fragments, while its user interfaces allow the discovery of these fragments with the help of human-interpretable concepts. The integrated state-of-the-art video analysis technologies perform very well and, by exploiting the processing capabilities of multi-thread / multi-core architectures, reduce the time required for analysis to approximately one third of the video’s duration, thus making the analysis three times faster than real-time processing

A Stepwise, Label-based Approach for Improving the Adversarial Training in Unsupervised Video Summarization

Author: Adamantidou E
Apostolidis E
Metsai AI
Mezaris V
Patras I
Publication venue: ACM Press, the 1st International Workshop
Publication date: 01/01/2019

In this paper we present our work on improving the efficiency of adversarial training for unsupervised video summarization. Our starting point is the SUM-GAN model, which creates a representative summary based on the intuition that such a summary should make it possible to reconstruct a video that is indistinguishable from the original one. We build on a publicly available implementation of a variation of this model that includes a linear compression layer to reduce the number of learned parameters and applies an incremental approach for training the different components of the architecture. After assessing the impact of these changes on the model’s performance, we propose a stepwise, label-based learning process to improve the training efficiency of the adversarial part of the model. Before evaluating our model’s efficiency, we perform a thorough study of the evaluation protocols used and we examine the possible performance on two benchmarking datasets, namely SumMe and TVSum. Experimental evaluations and comparisons with the state of the art highlight the competitiveness of the proposed method. An ablation study indicates the benefit of each applied change on the model’s performance, and points out the advantageous role of the introduced stepwise, label-based training strategy on the learning efficiency of the adversarial part of the architecture

Crossref

HyperReenact: one-shot reenactment via jointly learning to refine and retarget faces

Author: Argyriou V
Bounareli S
Patras I
Tzelepis C
Tzimiropoulos G
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)', International Conference on Computer Vision
Publication date: 02/10/2023

In this paper, we present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity, driven by a target facial pose. Existing state-of-the-art face reenactment methods train controllable generative models that learn to synthesize realistic facial images, yet they either produce reenacted faces that are prone to significant visual artifacts, especially under the challenging condition of extreme head pose changes, or require expensive few-shot fine-tuning to better preserve the source identity characteristics. We propose to address these limitations by leveraging the photorealistic generation ability and the disentangled properties of a pretrained StyleGAN2 generator, by first inverting the real images into its latent space and then using a hypernetwork to perform: (i) refinement of the source identity characteristics and (ii) facial pose re-targeting, thereby eliminating the dependence on external editing methods that typically produce artifacts. Our method operates under the one-shot setting (i.e., using a single source frame) and allows for cross-subject reenactment, without requiring any subject-specific fine-tuning. We compare our method both quantitatively and qualitatively against several state-of-the-art techniques on the standard benchmarks of VoxCeleb1 and VoxCeleb2, demonstrating the superiority of our approach in producing artifact-free images, exhibiting remarkable robustness even under extreme head pose changes. We make the code and the pretrained models publicly available at: https://github.com/StelaBou/HyperReenact